- Title
- A Proposed Framework for Early Prediction of Schistosomiasis
- Creator
- Ali, Zain; Hayat, Muhammad Faisal; Shaukat, Kamran; Alam, Talha Mahboob; Hameed, Ibrahim A.; Luo, Suhuai; Basheer, Shakila; Ayadi, Manel; Ksibi, Amel
- Relation
- Diagnostics Vol. 12, Issue 12, no. 3138
- Publisher Link
- http://dx.doi.org/10.3390/diagnostics12123138
- Publisher
- MDPI AG
- Resource Type
- journal article
- Date
- 2022
- Description
- Schistosomiasis is a neglected tropical disease that remains a leading cause of illness and death worldwide. The causative parasites attach to the skin through contaminated water and enter the human body. Failure to diagnose Schistosomiasis can lead to medical complications such as ascites, portal hypertension, esophageal varices, splenomegaly, and growth retardation. Early prediction and identification of risk factors may help treat the disease before it becomes incurable. We aimed to build a framework that incorporates the most significant features to predict Schistosomiasis using machine learning techniques. We used a dataset of 4316 individuals with advanced Schistosomiasis, comprising both recovery and death cases. The dataset contains demographic, socioeconomic, and clinical factors together with laboratory reports. Data preprocessing techniques (missing-value imputation, outlier removal, data normalisation, and data transformation) were applied to improve results. Feature selection techniques, including correlation-based feature selection, information gain, gain ratio, ReliefF, and OneR, were used to reduce the large number of features. Data resampling algorithms, including random undersampling, random oversampling, Cluster Centroid, NearMiss, and SMOTE, were applied to address the class imbalance problem. Four machine learning algorithms were used to construct the model: Gradient Boosting, Light Gradient Boosting, Extreme Gradient Boosting, and CatBoost. The performance of the proposed framework was evaluated using accuracy, precision, recall, and F1-score. The CatBoost model performed best, with the highest accuracy (87.1%), compared with Gradient Boosting (86%), Light Gradient Boosting (86.7%), and Extreme Gradient Boosting (86.9%). Our proposed framework will assist doctors and healthcare professionals in the early diagnosis of Schistosomiasis.
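The Description names several generic steps: ranking features, rebalancing classes, and scoring a classifier. The following is not the authors' code; it is a minimal, dependency-free Python sketch, assuming discrete features and a binary outcome (hypothetically 1 = death, 0 = recovery), of three of those steps: information gain and gain ratio for feature ranking, random oversampling for class imbalance, and accuracy, precision, recall, and F1. All function names and the toy data are hypothetical illustrations.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the split information (entropy) of the feature."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes
    match the majority count. Apply to the training split only, so duplicated
    rows do not leak into the evaluation data."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from the study's dataset.
    outcome = [0, 0, 0, 0, 0, 0, 1, 1]
    ascites = [0, 0, 0, 0, 0, 1, 1, 1]
    sex = [0, 1, 0, 1, 0, 1, 0, 1]
    print("IG ascites:", information_gain(ascites, outcome))
    print("IG sex:", information_gain(sex, outcome))
    _, balanced = random_oversample([[a, s] for a, s in zip(ascites, sex)], outcome)
    print("class counts after oversampling:", Counter(balanced))
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In this toy example the hypothetical ascites feature separates outcomes better than sex, so it receives the higher information gain. SMOTE, NearMiss, ReliefF, OneR, correlation-based selection and the boosting models are not sketched here; library implementations exist (for example in imbalanced-learn, LightGBM, XGBoost and CatBoost), but this record does not state which tools the authors used.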
- Subject
- machine learning; schistosomiasis; healthcare data; data imbalance; feature selection; data resampling; SDG 3; Sustainable Development Goals
- Identifier
- http://hdl.handle.net/1959.13/1480244
- Identifier
- uon:50466
- Identifier
- ISSN:2075-4418
- Rights
- © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
- Language
- eng
- Full Text
- Reviewed
File | Description | Size | Format
---|---|---|---
ATTACHMENT02 | Publisher version (open access) | 1 MB | Adobe Acrobat PDF